Analysis of a data set regarding faculty salaries
## # A tibble: 6 × 17
## FedID UnivName State Tier AvgFu…¹ AvgAs…² AvgAs…³ AvgPr…⁴ AvgFu…⁵ AvgAs…⁶
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1061 Alaska Paci… AK IIB 454 382 362 382 567 485
## 2 1063 Univ.Alaska… AK I 686 560 432 508 914 753
## 3 1065 Univ.Alaska… AK IIA 533 494 329 415 716 663
## 4 11462 Univ.Alaska… AK IIA 612 507 414 498 825 681
## 5 1002 Alabama Agr… AL IIA 442 369 310 350 530 444
## 6 1004 University … AL IIA 441 385 310 388 542 473
## # … with 7 more variables: AvgAssistProfComp <dbl>, AvgProfCompAll <dbl>,
## # NumFullProfs <dbl>, NumAssocProfs <dbl>, NumAssistProfs <dbl>,
## # NumInstructors <dbl>, NumFacultyAll <dbl>, and abbreviated variable names
## # ¹AvgFullProfSalary, ²AvgAssocProfSalary, ³AvgAssistProfSalary,
## # ⁴AvgProfSalaryAll, ⁵AvgFullProfComp, ⁶AvgAssocProfComp
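This is not a “tidy” data set: salaries, compensation, and faculty counts
for each rank are spread across separate columns. I cleaned it by writing
a reusable function that renames the columns and reshapes them into long format.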
## # A tibble: 6 × 14
## fed_id univ_name state tier avg_p…¹ avg_p…² num_i…³ num_f…⁴ rank salary
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 1061 Alaska Pacifi… AK IIB 382 487 4 32 full… 454
## 2 1061 Alaska Pacifi… AK IIB 382 487 4 32 full… 454
## 3 1061 Alaska Pacifi… AK IIB 382 487 4 32 full… 454
## 4 1061 Alaska Pacifi… AK IIB 382 487 4 32 full… 454
## 5 1061 Alaska Pacifi… AK IIB 382 487 4 32 full… 454
## 6 1061 Alaska Pacifi… AK IIB 382 487 4 32 full… 454
## # … with 4 more variables: comp_type <chr>, comp_amt <dbl>, faculty_type <chr>,
## # faculty_count <dbl>, and abbreviated variable names ¹avg_prof_salary_all,
## # ²avg_prof_comp_all, ³num_instructors, ⁴num_faculty_all
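The cleaning function itself is not shown here. As a minimal sketch, assuming the tidyverse packages tibble and tidyr are installed, the hypothetical helper `tidy_salaries()` below reshapes only the three average-salary columns into a `rank` / `salary` pair. The six schools are copied from the preview of the raw data; `UnivName` is left out because it is truncated there.

```r
library(tibble)
library(tidyr)

# Six schools copied from the raw-data preview (UnivName omitted: truncated).
faculty_wide <- tibble(
  FedID = c(1061, 1063, 1065, 11462, 1002, 1004),
  State = c("AK", "AK", "AK", "AK", "AL", "AL"),
  Tier  = c("IIB", "I", "IIA", "IIA", "IIA", "IIA"),
  AvgFullProfSalary   = c(454, 686, 533, 612, 442, 441),
  AvgAssocProfSalary  = c(382, 560, 494, 507, 369, 385),
  AvgAssistProfSalary = c(362, 432, 329, 414, 310, 310)
)

# Hypothetical helper: one row per (school, rank), with the rank taken
# from the column name ("Full", "Assoc", "Assist") and its value in `salary`.
tidy_salaries <- function(df) {
  pivot_longer(
    df,
    cols = c(AvgFullProfSalary, AvgAssocProfSalary, AvgAssistProfSalary),
    names_to = "rank",
    names_pattern = "Avg(.*)ProfSalary",
    values_to = "salary"
  )
}

faculty_long <- tidy_salaries(faculty_wide)
faculty_long
```

Pivoting one group of columns per call keeps exactly one row per school and rank. The identical-looking rows in the cleaned preview suggest that the salary, compensation, and faculty-count groups were each pivoted in turn, which repeats every salary once per combination of the other pivoted values; whether that is wanted depends on the analysis.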
This is an ANOVA model, a linear modeling method that evaluates the
relationships between a numeric outcome and one or more explanatory
variables. It partitions the variation in the outcome among those
variables, so we can see which ones account for the most variation. We can
use tools like this to identify variables worth exploring when changing our
experiments or workflow, or to make predictions.
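As a rough sketch in base R, assuming a long-format table with one row per school and rank, `aov()` fits an ANOVA and `summary()` reports a sum of squares for each term. Only the six schools from the preview are used, so the output illustrates the mechanics and says nothing about the full data set. With the default sequential (Type I) sums of squares, the variation credited to each term depends on its order in the formula.

```r
# Illustrative long-format data: the six previewed schools, three ranks each.
salary_long <- data.frame(
  rank   = rep(c("Full", "Assoc", "Assist"), times = 6),
  tier   = rep(c("IIB", "I", "IIA", "IIA", "IIA", "IIA"), each = 3),
  salary = c(454, 382, 362, 686, 560, 432, 533, 494, 329,
             612, 507, 414, 442, 369, 310, 441, 385, 310)
)
salary_long$rank <- factor(salary_long$rank)
salary_long$tier <- factor(salary_long$tier)

# One-way effects of rank and tier on salary; "Sum Sq" shows how much
# variation each term accounts for, "Pr(>F)" its significance.
fit <- aov(salary ~ rank + tier, data = salary_long)
summary(fit)
```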
ANOVA is just one modeling method; many others are readily available in R
and RStudio. My job is to use objective criteria computed by the software
to select the model that best fits each data set.
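One common way to let the software compare candidate models objectively is an information criterion such as AIC, together with an F-test for nested models. The sketch below is an assumption about how such a comparison could look, not the author's actual workflow: it uses the same six-school illustration and compares a rank-only linear model with one that also includes tier. A lower AIC indicates a better trade-off between fit and complexity.

```r
# Illustrative long-format data: the six previewed schools, three ranks each.
salary_long <- data.frame(
  rank   = factor(rep(c("Full", "Assoc", "Assist"), times = 6)),
  tier   = factor(rep(c("IIB", "I", "IIA", "IIA", "IIA", "IIA"), each = 3)),
  salary = c(454, 382, 362, 686, 560, 432, 533, 494, 329,
             612, 507, 414, 442, 369, 310, 441, 385, 310)
)

# Two nested candidate models.
m_rank      <- lm(salary ~ rank, data = salary_long)
m_rank_tier <- lm(salary ~ rank + tier, data = salary_long)

# F-test: does adding tier explain significantly more variation?
nested_test <- anova(m_rank, m_rank_tier)
nested_test

# AIC for each model (df = estimated parameters, including the error variance).
model_aic <- AIC(m_rank, m_rank_tier)
model_aic
```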